Chapter 4: Hypothesis Testing

Author

Colin Foster

Welcome to the online content for Chapter 4!

As always, I’ll assume that you’ve already read up to this chapter of the book and worked through the online content for the previous chapters. If not, please do that first.

And, as before, click the ‘Run Code’ buttons below to execute the R code. Remember to wait until they say ‘Run Code’ before you press them. And be careful to run these boxes in order if later boxes depend on you having done other things previously.

Tail percentages again

There isn’t really much new to learn about R for this chapter.

In the online content for the previous chapter, we used the function pnorm to find tail probabilities (or percentages) inside Normal curves. We’ll continue to use this function to verify some of the values that I gave in Chapter 4.

For example, we asked the question, “How likely is it for a randomly selected human to be as short as 159 cm?” This was assuming that human heights were Normally distributed with a mean of 170 cm and a standard deviation of 6 cm.

We can find the answer by doing:
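As a minimal sketch, the call passes the height first, then the mean and the standard deviation. Naming the mean and sd arguments is a choice made here for readability; supplying them by position gives the same result.

```r
# Proportion of humans shorter than 159 cm, assuming heights are
# Normally distributed with mean 170 cm and standard deviation 6 cm
pnorm(159, mean = 170, sd = 6)
# about 0.033
```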

This is the 3.3% that I mentioned in the chapter.

To get the \(p\) value, we doubled this:
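A sketch of the doubling, reusing the same assumed mean of 170 cm and standard deviation of 6 cm:

```r
# Two-tailed p value: twice the lower-tail area below 159 cm
2*pnorm(159, mean = 170, sd = 6)
# about 0.067
```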

The 2* at the front means multiply by 2. So the \(p\) value was .067 or 6.7%.

Finding upper tails

We also wanted to answer the question, “How likely is it for a randomly selected horse to be as tall as 159 cm?” We took horse heights as being Normally distributed with a mean of 149 cm and a standard deviation of 4 cm.

We could change the code like this:
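A first attempt might simply swap in the horse values, as in this sketch (again with mean and sd named for readability):

```r
# Naive attempt: same call as before, with the horse mean (149 cm)
# and standard deviation (4 cm). By default this returns the area
# BELOW 159 cm, which is not the tail we are after.
pnorm(159, mean = 149, sd = 4)
# about 0.994
```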

The output seems like a very high percentage indeed! Are this many horses really taller than 159 cm?

The mistake is that, by default, pnorm tells us the percentage of observations smaller than the value that we put in (159 cm in this case). That was fine for the previous question, because we wanted to know the percentage of humans who were shorter than Chris. But this time we’re getting the percentage of horses shorter than 159 cm, which is most of them. To answer our question, we want the percentage of horses that are taller than 159 cm.

To get the percentage that we want, we have to subtract our number from 1, which is equivalent to subtracting our percentage from 100%.
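A sketch of the subtraction, with the doubling that gives the \(p\) value included as a second line:

```r
# Upper tail: proportion of horses taller than 159 cm
1 - pnorm(159, mean = 149, sd = 4)
# about 0.006

# Two-tailed p value: double the upper tail
2*(1 - pnorm(159, mean = 149, sd = 4))
# about 0.012
```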

This gives us the 0.6% that I stated in the chapter, which we doubled to get the \(p\) value of 1.2%.

Because all Normal distributions are symmetrical, an upper tail will always be equal in size to the corresponding lower tail, which is why doubling one tail always gives the two-tailed \(p\) value.
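If you'd like to check the symmetry for yourself, here is a small sketch. The value 139 cm is not from the chapter; it is chosen here because it lies 10 cm below the horse mean, just as 159 cm lies 10 cm above it. The variable names are only illustrative.

```r
# Upper tail above 159 cm (10 cm above the mean of 149 cm)
upper <- 1 - pnorm(159, mean = 149, sd = 4)

# Lower tail below 139 cm (10 cm below the mean of 149 cm)
lower <- pnorm(139, mean = 149, sd = 4)

# all.equal compares the two areas, allowing for tiny rounding differences
all.equal(upper, lower)
```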

If you don’t want to bother doing the subtraction from 1, there’s actually an option for getting upper, rather than lower, tails:
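A sketch using pnorm's lower.tail argument, with the same assumed horse mean and standard deviation:

```r
# Ask pnorm directly for the area ABOVE 159 cm
pnorm(159, mean = 149, sd = 4, lower.tail=FALSE)
# about 0.006, matching the subtraction from 1
```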

This lower.tail=FALSE argument says that we don’t want the lower tail, which is a convoluted way of saying that we want the upper tail instead! This argument has a default value of TRUE, meaning that if you don’t mention lower.tail at all, R will assume that it’s TRUE, and therefore give you the lower tail area. So, you only need to mention lower.tail if you want to set it to FALSE, so as to obtain the upper tail. R often operates like this, supplying sensible defaults for common situations, so that you don’t have to mention things unless you want something that’s a bit unusual.